Abstract

In this paper we propose a class of prior distributions on decomposable graphs, allowing for improved
modeling flexibility. While existing methods solely penalize the number of edges, the proposed work
empowers practitioners to control clustering, level of separation, and other features of the graph. Emphasis
is placed on a particular prior distribution which derives its motivation from the class of product partition
models; the properties of this prior relative to existing priors are examined through theory and simulation.
We then demonstrate the use of graphical models in the field of agriculture, showing how the proposed prior
distribution alleviates the inflexibility of previous approaches in properly modeling the interactions between
the yield of different crop varieties. Lastly, we explore American voting data, comparing the voting patterns
amongst the states over the last century.

1 Introduction 

This paper is concerned with the inference of the conditional independence graph $\mathcal{G}$ of a multivariate
random vector $Y$ of dimension $n$, a problem sometimes referred to as structure learning. We focus here on
undirected decomposable graphs, whose popularity is mainly due to the tractable factorization they allow for
the likelihood [9, 20]; related work for directed graphical models can be found in [?]. Learning the conditional
independence graph $\mathcal{G}$ is an onerous task due to the large number of graphs on a set of $n$ nodes, or variables.
It is possible using optimization methods to find the graph which best fits the data according to some
metric [23, 30, 13]; alternatively, Bayesian model averaging may be used to account for uncertainty in the
estimated graph, or maximum a posteriori estimation may be used to select a given model from the posterior
over graphs. Such an approach relies on a prior distribution $\pi(\mathcal{G})$ over the set of decomposable graphs of a
given size; through Bayes' theorem, this prior is updated based on the data to give an a posteriori estimate of
the distribution over graphs.

Current approaches have been limited in their ability to accommodate varying forms of prior information 
on the graph. For instance, in an effort to encourage interpretable graphs, the standard approach has been to 
penalize the number of edges (conditional dependencies) in the graph. However, many situations exist where 
one might expect variables to be clustered together and the graph to exhibit block structure. At the moment 
no such prior distribution exists to handle this problem. Our contribution in this article is to propose a class of 
prior distributions motivated from the class of product partition models which will allow improved flexibility 
in the specification of prior information on the graph. 

The field of agriculture is particularly suitable to the application of graphical models. Due to large spatial 
domains as well as multifarious crop varieties, it is valuable to have models which both handle the com- 
plexity of the biophysical process as well as allow straightforward interpretation. In particular, one might 
examine the set of zero/non-zero correlations between crop varieties' yields, using the presence or absence 
of edges to make decisions regarding crop management, marketing, and insurance policies. In addition, due 
to small sample sizes in many agricultural applications, the choice of prior distribution becomes particularly 
important. 

2 Bayesian Inference on Decomposable Graphs 

We begin with a brief overview of graphical models, following the exposition in [?]; see also [20] for further
details on graphical models. Let $\mathcal{G} = (V, E)$ be a graphical model with vertices $V = \{1, \ldots, n\}$ and
pairwise edges $E$. Two nodes $i, j \in V$ are adjacent if $(i, j) \in E$, and a subset $C \subseteq V$ is said
to be complete if all its elements are adjacent to each other. A complete subgraph that is maximal (i.e. not
contained within another complete subgraph) is called a clique. An ordering of the cliques of an undirected
graph, $(C_1, \ldots, C_{n_c})$, is said to be perfect if the vertices of each clique $C_i$ also contained in any previous
clique $C_1, \ldots, C_{i-1}$ are all members of one previous clique; that is, for $i = 2, 3, \ldots, n_c$,

$$H_i = C_i \cap \bigcup_{j=1}^{i-1} C_j \subseteq C_h$$

for some $h \in \{1, 2, \ldots, i-1\}$. The sets $H_i$, $i = 2, \ldots, n_c$, are called separators. We write
$S_1, \ldots, S_{n_s}$ for the non-empty separators (some might appear multiple times). If an undirected graph admits
a perfect ordering it is said to be decomposable.
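
As a minimal sketch of the definition above (not part of the original exposition), the following Python function checks whether a given ordering of cliques is perfect and, if so, returns the sets $H_i$. The function name `separators_if_perfect` and the two example graphs are illustrative assumptions.

```python
from typing import List, Optional, Set


def separators_if_perfect(cliques: List[Set[int]]) -> Optional[List[Set[int]]]:
    """Return [H_2, ..., H_nc] if the given clique ordering is perfect, else None."""
    separators = []
    for i in range(1, len(cliques)):
        earlier = set().union(*cliques[:i])   # union of all previous cliques
        h_i = cliques[i] & earlier            # H_i = C_i intersected with that union
        # H_i must be contained in a single previous clique.
        if not any(h_i <= c for c in cliques[:i]):
            return None
        separators.append(h_i)
    return separators


# The chain 1-2-3-4 has cliques {1,2}, {2,3}, {3,4}; this ordering is perfect.
chain = [{1, 2}, {2, 3}, {3, 4}]
print(separators_if_perfect(chain))   # [{2}, {3}]

# The 4-cycle 1-2-3-4-1 is not decomposable; this ordering of its cliques fails.
cycle = [{1, 2}, {2, 3}, {3, 4}, {4, 1}]
print(separators_if_perfect(cycle))   # None
```

The function only tests the ordering it is given; a graph is decomposable if at least one ordering passes.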

We associate to each vertex $i$ a random variable $Y_i$. For $A \subseteq V$, let $Y_A = \{Y_i \mid i \in A\}$. A distribution
$P$ over $V$ is Markov with respect to $\mathcal{G}$ if, for any decomposition $(A, B)$ of $\mathcal{G}$, $Y_A$ is independent of $Y_B$
given $Y_{A \cap B}$. The widespread use of decomposable models is due to the resulting factorization of densities.
Specifically, if $P$ satisfies the conditional independencies implied by a decomposable graph $\mathcal{G}$, then the
likelihood of the graphical model specified by $P$ can be factorized according to the graph's cliques and
separators

$$p(y \mid \theta, \mathcal{G}) = \frac{\prod_{j=1}^{n_c} p(y_{C_j} \mid \theta_{C_j})}{\prod_{j=1}^{n_s} p(y_{S_j} \mid \theta_{S_j})} \qquad (1)$$

where $\theta$ is a quantity parameterizing the graphical model $P$ over the graph $\mathcal{G}$ and satisfying some consistency
conditions with respect to $\mathcal{G}$ [?].

Traditionally, focus has been on Gaussian graphical models, also known as covariance selection models
[10], where $P = \mathcal{N}_n(\mu, \Sigma)$ is an $n$-dimensional multivariate Gaussian distribution and $\theta$ is the $n \times n$
covariance matrix $\Sigma$. Conditional independence structure is represented by the precision matrix $\Sigma^{-1}$. If the edge
$(i, j) \notin E$, then the variables $Y_i$ and $Y_j$ are conditionally independent given the remaining variables, and
$(\Sigma^{-1})_{ij} = (\Sigma^{-1})_{ji} = 0$. As such, the Gaussian graphical model may be factorized as (1) with the covariance $\Sigma$
replacing $\theta$, and the corresponding likelihood terms written as

$$p(y_B \mid \Sigma_B) = (2\pi)^{-|B|/2} \det(\Sigma_B)^{-1/2} \exp\left[-\tfrac{1}{2} \mathrm{tr}\left(S_B (\Sigma_B)^{-1}\right)\right] \qquad (2)$$

for each complete set $B$, where $|B|$ denotes the cardinality of $B$ and $S_B$ is the empirical covariance matrix
of $y_B$.
From a Bayesian perspective, we are interested in the posterior distribution
$p(\theta, \mathcal{G} \mid y) \propto p(y \mid \theta, \mathcal{G})\, p(\theta \mid \mathcal{G})\, \pi(\mathcal{G})$.
Much work has been dedicated to specifying proper priors $p(\theta \mid \mathcal{G})$; see e.g. [15, 9]. The main focus of this
paper is the specification of a prior distribution $\pi(\mathcal{G})$ over the space of decomposable graphs. As this space is
very large compared to the number of observations, it is crucial to add as much prior information as possible
on the structure of the unknown graph $\mathcal{G}$. Moreover, we are generally interested in obtaining sparse graph
estimates for needs of interpretation and prediction. Up until now, the specification of $\pi(\mathcal{G})$ has been limited
to the uniform distribution, or priors which penalize the complexity as measured by the number of edges.
This brings us to the focus of this work, namely a class of prior distributions $\pi(\mathcal{G})$ which provides control
over the structure and features of $\mathcal{G}$.

3 Priors on Decomposable Graphs 

3.1 Previous work 

While early work on inference in decomposable models often assumed a uniform prior over graphs (e.g. [15]),
such priors put considerable mass on models of intermediate size. In an effort to put more weight on smaller
graphs, several authors have proposed using a binomial prior distribution with parameter $p$ on the number of
edges $r$ in the graph. This yields priors of the type [?, 17]

$$\pi(\mathcal{G}) \propto p^r (1 - p)^{m - r} \qquad (3)$$

where $m = \binom{n}{2}$ is the maximal number of possible edges on $n$ nodes. When $p = 1/2$, it reduces to the
aforementioned uniform prior over graphs. [17] suggest the use of $p = 2/(n-1)$, motivated by the resulting
density's peak at $n$ edges in the unconstrained graph. Some authors also consider adding a hierarchical Beta
prior $p \sim \mathrm{Be}(a, b)$ [?], giving the marginal prior on the graph as

$$\pi(\mathcal{G}) = \int_0^1 \pi(\mathcal{G} \mid p)\, \pi(p)\, dp \propto \frac{\beta(a + r, b + m - r)}{\beta(a, b)}$$

where $\beta(\cdot, \cdot)$ is the beta function. [?] suggest a default choice of $a = b = 1$, implying a uniform prior on $p$.
Interestingly, the resulting prior on $\mathcal{G}$ is

$$\pi(\mathcal{G}) \propto \frac{1}{m + 1} \binom{m}{r}^{-1}$$

which penalizes medium-sized graphs as desired. Such a prior weights each graph inversely to the number
of graphs in the unrestricted space with the same number of edges. However, as shown by [2], the space
of decomposable graphs can be considerably different than the unrestricted space. To address this, [2] have
proposed a uniform prior on decomposable graphs given the number of edges. However, calculating the
number of decomposable graphs of a given size is an arduous task: there exists no list in the literature of
decomposable graphs and their breakdown in terms of number of edges, nor are there straightforward ways
of computing such quantities. As a result, [2] proposes an MCMC estimation scheme, testing its accuracy up
to 12 nodes, although such a scheme will likely become prohibitive in higher dimensions.
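
To make the edge-count priors concrete, the sketch below evaluates the unnormalised log of the binomial prior, $r \log p + (m - r) \log(1 - p)$, and of its Beta-binomial marginal, $\log \beta(a + r, b + m - r) - \log \beta(a, b)$. It then checks numerically that $a = b = 1$ gives $1 / ((m + 1) \binom{m}{r})$. The function names are hypothetical and chosen only for this illustration.

```python
import math


def log_binomial_prior(r: int, n: int, p: float) -> float:
    """Unnormalised log prior for a graph with r edges on n nodes."""
    m = n * (n - 1) // 2                      # maximal number of edges
    return r * math.log(p) + (m - r) * math.log(1.0 - p)


def log_beta(x: float, y: float) -> float:
    """Log of the beta function, via log-gamma for numerical stability."""
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)


def log_beta_binomial_prior(r: int, n: int, a: float = 1.0, b: float = 1.0) -> float:
    """Log marginal prior after integrating p against a Be(a, b) density."""
    m = n * (n - 1) // 2
    return log_beta(a + r, b + m - r) - log_beta(a, b)


n, r = 20, 19
m = n * (n - 1) // 2                          # 190 possible edges
closed_form = -math.log((m + 1) * math.comb(m, r))
print(abs(log_beta_binomial_prior(r, n) - closed_form) < 1e-9)   # True

# With p = 1/2 the binomial prior does not depend on r (uniform over graphs).
print(math.isclose(log_binomial_prior(5, n, 0.5), log_binomial_prior(100, n, 0.5)))   # True
```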

While the priors in the above references allow one to control the size of the resulting graphs through the
number of edges, often doing so results in undesirable graph structures, namely those with a high number of
separators and long strings of nodes. Figure 1 (top) shows random samples from a binomial prior over
20-node graphs with $p = 0.1$ (closely echoing the choice of [17], namely $p = 2/(n-1) \approx 0.1$) and $p = 0.5$ (the
uniform prior). We see from this plot that there is no clustering of the cliques, making interpretation difficult.
In addition, the long strings/trees seen for $p = 0.1$ do not mesh with reality in most cases. Clearly such a
class of priors is not suitable if one suspects clustering amongst the variables, clique sizes to be upper (or
lower) bounded, or nearly full separation between cliques. Our focus therefore is on moving beyond priors
which focus on the number of edges to priors which focus on graph (clique and separator) structure.

3.2 A new prior distribution on decomposable graphs 

Motivated by the class of product partition models [16, 3, 4], we consider prior distributions of the form

$$\pi(\mathcal{G}) \propto \frac{\prod_{j=1}^{n_c} \psi_C(C_j)}{\prod_{j=1}^{n_s} \psi_S(S_j)} \qquad (4)$$



[Figure 1 panels. Top row: binom(0.1), binom(0.5). Bottom row: PGM(0.1, 0.001), PGM(10, 0.001). Clique and separator sizes are printed beneath each sampled graph.]

Figure 1: Four random samples from binomial and product graphical model (PGM) priors. Clique and 
separator sizes for each graph are also shown ("Clique Sizes: 2(3)" implies 3 cliques of size 2). 4 million 
samples were generated using Markov chain Monte Carlo, and every millionth is shown. While the binomial 
is characterized by large strings and many separators, the product graphical model allows one to induce 
clustering by setting b small. 



where $\psi_C$ and $\psi_S$ are respectively called the clique and separator cohesion functions, with the convention that
$\psi_S(\emptyset) = 1$. Evidently one could choose to penalize only cliques or separators by setting $\psi_C$ or $\psi_S$ to constant
values. Alternatively, one could simply penalize clique sizes by setting $\psi_C(B) = a|B|$. Motivated by the class
of product partition models, consider the cohesion functions $\psi_C(B) = a(|B| - 1)!$ and $\psi_S(B) = \frac{1}{b}(|B| - 1)!$,
$a > 0$, $b > 0$, hence

$$\pi(\mathcal{G}) \propto a^{n_c} b^{n_s} \frac{\prod_{j=1}^{n_c} (|C_j| - 1)!}{\prod_{j=1}^{n_s} (|S_j| - 1)!} \qquad (5)$$

The factorial terms result in a predilection towards large cliques and small separators, a desirable trait in
terms of interpretability of the resulting graph. For instance, even if $a = b = 1$ with 20 nodes, the completely
connected graph (one clique of size 20) would be preferred over the complete independence graph (20 cliques
of size 1) by a factor of $19!$. The parameters $a$ and $b$ respectively tune the number of cliques and separators
in the decomposable graph. For $a$ small, the prior will favour a small number of large cliques. Likewise for $b$,
with small values favouring fewer separators. Figure 1 (bottom) shows samples from this prior. Because of its
relation to product partition models (described later), we term this prior the product graphical model prior.
To clearly demonstrate the control the product graphical model prior (5) gives relative to the binomial prior,
we set $b = 1/1000$, highly penalizing the number of separators and hence resulting in highly separated cliques.
In addition, we look at two different values for $a$: $a = 0.1$, resulting in fewer and larger cliques, and $a = 10$,
resulting in more (but smaller) cliques. Figure 1 demonstrates the ability of the prior to induce clustering of
the cliques, and therefore sparsity in correlation.
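
The following sketch evaluates the unnormalised log of the product graphical model prior from the clique and separator sizes alone, assuming the cohesions $\psi_C(B) = a(|B| - 1)!$ and $\psi_S(B) = (|B| - 1)!/b$. The function name `log_pgm_prior` and the small example graphs are assumptions made for illustration.

```python
import math
from typing import Sequence


def log_pgm_prior(clique_sizes: Sequence[int], separator_sizes: Sequence[int],
                  a: float, b: float) -> float:
    """Unnormalised log prior: sum log psi_C(C_j) - sum log psi_S(S_j).

    Pass only the non-empty separators; math.lgamma(k) equals log((k - 1)!).
    """
    log_cliques = sum(math.log(a) + math.lgamma(c) for c in clique_sizes)
    log_seps = sum(math.lgamma(s) - math.log(b) for s in separator_sizes)
    return log_cliques - log_seps


# 20 nodes, a = b = 1: one clique of size 20 versus 20 singleton cliques.
complete = log_pgm_prior([20], [], a=1.0, b=1.0)
independent = log_pgm_prior([1] * 20, [], a=1.0, b=1.0)
print(math.isclose(complete - independent, math.lgamma(20)))   # True: ratio is 19!

# 4 nodes, small b: two disjoint edges versus the chain 1-2-3-4
# (three cliques of size 2, two separators of size 1).
two_blocks = log_pgm_prior([2, 2], [], a=0.1, b=0.001)
chain = log_pgm_prior([2, 2, 2], [1, 1], a=0.1, b=0.001)
print(two_blocks > chain)   # True: separators are heavily penalised
```

Working on the log scale avoids overflow in the factorials for larger graphs.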

We have seen some general properties of the prior (5), namely the ability to control the number of cliques
and separators. Figure 2 shows $\log_{10}(\pi(\mathcal{G})/\pi(\mathcal{G}'))$ for different graphs $\mathcal{G}$, $\mathcal{G}'$. Specifically, decreases in $b$
result in increased prior probability on models with few separators; in addition we see that as $a$ is increased,
more mass is put on models with many cliques. In contrast, we also plot the same ratio for the binomial prior
(3). From this one can see the limited control such a prior gives, favouring small models in terms of number
of edges, but putting very little mass on models which, for example, feature clusters of fully-connected nodes
(as in $\mathcal{G}_3$) and therefore have sparse covariance matrices.

Figure 2: Log ratio of priors over two graphs for the product graphical model prior for various $a$, $b$ (solid, bottom
axis) and the binomial prior for various $p$ (dashed, top axis). While the binomial prior allows one to control the
number of edges, for instance choosing $\mathcal{G}_1$ over $\mathcal{G}_2$, the same parameter would seldom choose $\mathcal{G}_3$ over $\mathcal{G}_4$,
despite $\mathcal{G}_3$ having a sparse covariance matrix and $\mathcal{G}_4$ having a saturated covariance matrix.

Selecting the appropriate cohesion functions in equation (4) is a difficult problem, but one for which we
may gain insight from the existing literature on product partition models [18, 27, 26]. For instance, one
may use Figure 2 to select $a$ and $b$ to best fit with prior intuition regarding the features of the graph, then
verify the choice through generation of Monte Carlo samples from the prior as in Figure 1. Alternatively,
cross-validation or related methods may be used to select $a$ and $b$; due to the potential computational cost of
such methods, sequential Monte Carlo approaches may be used to speed up prior distribution selection [5].

Given that the likelihood decomposes as (1) and the prior is of the form (4), the posterior will also be
of the form (4) with cohesions $\psi_C(C_j)\, p(y_{C_j})$ and $\psi_S(S_j)\, p(y_{S_j})$. The prior admits several other attractive
properties and connections with well-known clustering methods as well. If $\psi_S(S_j) \to \infty$ for all $S_j \neq \emptyset$, then
Equation (4) reduces to the following model

$$\pi(\mathcal{G}) \propto \prod_{j=1}^{n_c} \psi_C(C_j)$$

if $n_s = 0$, and $\pi(\mathcal{G}) = 0$ otherwise. The resulting prior puts positive mass only on graphs with no separators. It has
been introduced as a prior over partitions by [16] and [3, 4] under the name of product partition models. In
the particular case of (5) with $b \to 0$, the prior over $\mathcal{G}$ reduces to

$$\pi(\mathcal{G}) = \frac{\Gamma(a)}{\Gamma(a + n)}\, a^{n_c} \prod_{j=1}^{n_c} (|C_j| - 1)!$$

on graphs with $n_s = 0$. As shown by [27] (see also [26]), this is the distribution over partitions induced by a
Dirichlet process [12, 1]. We also have

$$E(n_c) = \sum_{i=0}^{n-1} \frac{a}{a + i} \simeq a \log(1 + n/a) + \gamma, \qquad \mathrm{var}(n_c) = \sum_{i=0}^{n-1} \frac{a\, i}{(a + i)^2}$$

where $\gamma$ is Euler's constant, and

$$\mathrm{pr}(n_c = k) = s(n, k)\, a^k\, \Gamma(a) / \Gamma(a + n)$$

where the coefficients $s(n, k)$ are the absolute values of Stirling numbers of the first kind [1]. In this limiting
case, the number of cliques increases logarithmically with the number of nodes.
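
As a numerical check of the limiting case, the sketch below computes $\mathrm{pr}(n_c = k) = s(n, k)\, a^k\, \Gamma(a)/\Gamma(a + n)$ using the standard recurrence $s(m, k) = (m - 1)\, s(m - 1, k) + s(m - 1, k - 1)$ for unsigned Stirling numbers of the first kind, and compares the resulting mean with $\sum_{i=0}^{n-1} a/(a + i)$. The function names and the values $n = 20$, $a = 2$ are illustrative assumptions.

```python
import math
from typing import List


def unsigned_stirling_first(n: int) -> List[int]:
    """Return [s(n, 0), ..., s(n, n)], unsigned Stirling numbers of the first kind."""
    row = [1]                                   # s(0, 0) = 1
    for m in range(1, n + 1):
        new = [0] * (m + 1)
        for k in range(m + 1):
            keep = (m - 1) * row[k] if k < m else 0
            add = row[k - 1] if k >= 1 else 0
            new[k] = keep + add
        row = new
    return row


def clique_count_pmf(n: int, a: float) -> List[float]:
    """pr(n_c = k) for k = 0, ..., n in the limit b -> 0."""
    s = unsigned_stirling_first(n)
    log_norm = math.lgamma(a) - math.lgamma(a + n)
    return [0.0 if s[k] == 0 else math.exp(math.log(s[k]) + k * math.log(a) + log_norm)
            for k in range(n + 1)]


n, a = 20, 2.0
pmf = clique_count_pmf(n, a)
mean_from_pmf = sum(k * p for k, p in enumerate(pmf))
mean_closed_form = sum(a / (a + i) for i in range(n))
print(math.isclose(sum(pmf), 1.0))                      # True
print(math.isclose(mean_from_pmf, mean_closed_form))    # True
```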

3.3 Extensions 

Motivated by the larger class of exchangeable partition functions [24, 19], we can also consider four-parameter
models, allowing more control over the relative sizes of the cliques/separators

where $a_2 > -a_1$, $0 \le a_1 < 1$, and likewise for $b_1$, $b_2$. The above model reduces to (5) when $a_1 = b_1 = 0$. We
can also consider models that control the maximal number of cliques/separators

$\pi(\mathcal{G}) \propto$

where $c_1, c_2, d_1, d_2 > 0$, and $c_1 > d_1$ are the maximal number of cliques/separators. These two models
respectively admit as limiting cases the distribution over partitions induced by the two-parameter
Poisson-Dirichlet distribution and the finite Dirichlet-multinomial distribution; see e.g. [19] for further details on
these distributions. Using such extensions, one is able to both extend the product graphical model prior to
control relative sizes and the maximal number of cliques and separators, as well as borrow from the wealth
of literature on Dirichlet and related distributions to gain insight into the prior distribution's characteristics.

4 Example: Modeling Agricultural Output of Different Species 

Determining agricultural policies to govern crop production, harvesting, and export is a challenge fraught with
high variability both temporally and spatially. Enabling effective crop management, handling, and marketing
thus requires an accurate understanding of crop yield that accounts for and explains these variations. While much
effort has been made in developing models for predicting single crops [28, 25], little effort has been made
in understanding statistically the relationship between the yields of different crop varieties.

Understanding the connection between yields of different crop varieties is valuable for a multitude of
reasons. Firstly, because certain crops are planted and harvested at different times, the management of one
crop might benefit from knowledge obtained from harvesting a similar crop earlier in the year. Additionally,
by accounting for correlation between different crops, insurers might better cover themselves against extreme
events and better control insurance rates for farmers. Lastly, farmers themselves might wish to ensure some
level of stability in their income, and therefore might prefer to plant crops which are uncorrelated in yield.
Through such a practice, a farmer would be proactive in preventing disasters across his entire crop portfolio.
Simply by looking at the resulting undirected graph, a farmer could select two crops which do not have a path
connecting them, and are therefore uncorrelated.
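
The path check described above amounts to asking whether two nodes lie in different connected components of the graph. A minimal sketch follows; the function names and the small edge set are hypothetical and invented for illustration only, not taken from any fitted graph.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def component_labels(nodes: Iterable[Hashable],
                     edges: Iterable[Tuple[Hashable, Hashable]]) -> Dict[Hashable, Hashable]:
    """Label every node with a representative of its connected component."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    labels: Dict[Hashable, Hashable] = {}
    for start in nodes:
        if start in labels:
            continue
        labels[start] = start
        stack = [start]
        while stack:                      # depth-first search from `start`
            u = stack.pop()
            for v in adjacency[u]:
                if v not in labels:
                    labels[v] = start
                    stack.append(v)
    return labels


def has_path(labels: Dict[Hashable, Hashable], u: Hashable, v: Hashable) -> bool:
    """True if u and v are connected by some path in the graph."""
    return labels[u] == labels[v]


# Hypothetical edges, made up for this example.
crops = ["winter wheat", "baby lima beans", "large lima beans",
         "summer potatoes", "fall potatoes"]
edges = [("baby lima beans", "large lima beans"),
         ("summer potatoes", "fall potatoes")]
labels = component_labels(crops, edges)
print(has_path(labels, "winter wheat", "fall potatoes"))      # False
print(has_path(labels, "summer potatoes", "fall potatoes"))   # True
```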

We examine the total production (in thousands of bushels) of 24 crops in the state of California from the
years 1990 to 2009 (20 years). The data is compiled from the U.S. Department of Agriculture website, where
a considerable database is available for viewing and analysis. The 24 crops include, for example, several
varieties of wheat, rice, and beans. We use the now-standard Gaussian hyper-inverse Wishart model: the
likelihood of yield is given in (1) and (2), and the prior for the covariance matrix $\Sigma$ is hyper-inverse Wishart,
which factorizes similarly to (1), as a ratio of inverse Wishart distributions over cliques and separators [14].
See [?] for some alternative marginal likelihoods based on fractional Bayes factors which can help to induce
parsimony. The parameters chosen for the hyper-inverse Wishart distribution are as described in [17]; we
focus on the specification of $\pi(\mathcal{G})$. Looking at the list of crops, one would expect that there will be clustering
of the yields according to crop characteristics. For instance, it would be reasonable to expect the yields of the
bean varieties to be correlated with each other. We also seek an interpretable graph, namely one with small
complexity (in terms of number of edges and/or separators). The first such prior we examine is the binomial
prior of [17] with $p = 2/(n-1)$, chosen due to its prevalence in the literature. While such a prior allows for
penalization on the number of edges, no control is available over clustering. In contrast, by using the prior (5),
we can set $b = 0.01$ to put strong penalization on the number of separators (and hence induce separation of the
cliques and therefore sparsity in the correlation matrix), and set $a = 0.01$ to encourage a small number of
cliques in the pursuit of simplicity in the resulting graph.

Crop Varieties

1: wheat (winter)
2: wheat (durum)
3: rice (long grain)
4: rice (medium grain)
5: rice (short grain)
6: corn (grain)
7: corn (silage)
8: oats
9: barley
10: cotton (upland)
11: cotton (pima)
12: beans (large lima)
13: beans (baby lima)
14: beans (light kidney)
15: beans (dark kidney)
16: beans (blackeye)
17: beans (other)
18: hay (alfalfa)
19: hay (other)
20: potatoes (winter)
21: potatoes (spring)
22: potatoes (summer)
23: potatoes (fall)
24: sweet potatoes

Figure 3: Four samples with highest posterior probability from crop yield model using binomial and product
graphical model (PGM) priors. We see the bean yields (nodes 12 through 17) seem to cluster together, as do
summer and fall potatoes (nodes 22 and 23). We also observe that the product graphical model prior induces
separated cliques, whereas the binomial prior results in long strings and trees of connected variables. As a
result, the product graphical model prior will induce sparsity in the resulting posterior covariance.

We run MCMC of length 10 million over the space of decomposable graphs [15] for both the binomial
and product graphical model priors, thinning to every 100th sample. With both priors, one may save
computational resources by making local moves, merging and splitting cliques within the Markov chain. As a result,
one need not re-determine the structure of the entire graph at each move. Figure 3 shows the 4 graphs with
highest posterior probabilities from each prior. The product graphical model prior results in the top 4 graphs
having posterior density values in the range 0.11 to 0.49, whereas for the binomial the range is 0.04 to 0.06,
indicating that the binomial prior spreads mass much more evenly across graphs relative to the product
graphical model prior with $a = b = 0.01$. Immediately evident from the figure are the different forms resulting
from each prior. Specifically, the binomial prior induces long strings of nodes with many separators, whereas
the product graphical model posterior reflects our prior beliefs that variables will cluster together, resulting in
sparsity in the correlations between variables. A commercial farmer desiring to plant two plots with
uncorrelated crops to minimize the risk of loss might reach quite different conclusions from each prior. Specifically,
the long strings of nodes from the binomial prior suggest correlation between the majority of crops. The
farmer might not plant winter wheat (planted in late fall) and a strain of beans (harvested in early fall) on his
two plots, despite their very different growing seasons, due to their connection in two of the highest posterior
probability graphs in Figure 3. In contrast, the separation of cliques from the product graphical model prior
(5) would allow these crops to be planted together. Such decisions could be made from the highest posterior
graph, or by conducting Bayesian model averaging to obtain the expected utility of a given decision.

Table 1: Log predictive density evaluated on test data using various priors

Distribution:           Binomial    Binomial    PGM             PGM           PGM
Parameters:             2/(24-1)    0.5         (0.01, 0.01)    (0.1, 0.1)    (1, 1)
Avg. Log Predictive:    -688        -707        -675            -677          -686
Avg. Number of Edges:   17.8        29.6        18.1            16.4          20.3

To gain an understanding of the product graphical model prior's prediction performance, we split the
data into a training set (first 12 years) and testing set (last 8 years). After simulating from the posterior
distribution arising from the binomial and product graphical model priors, we use Bayesian model averaging
via the marginal likelihood evaluated on the test data to judge the model's prediction performance. We
report the resulting posterior predictive density evaluated on the test set in Table 1; indeed, the product graphical
model prior provides better prediction in this example, even over a variety of parameter choices. We also
show the average number of edges for each model, indicating that sparsity in terms of edges alone is not
responsible for the improved prediction.
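
One common way to carry out such model averaging, sketched here under our own assumptions rather than as the exact procedure used for Table 1, is to average the predictive densities $p(y_{test} \mid \mathcal{G}_t, y_{train})$ over the retained posterior samples $\mathcal{G}_t$ on the log scale. The helper `log_mean_exp` and the numerical values below are hypothetical.

```python
import math
from typing import Sequence


def log_mean_exp(log_values: Sequence[float]) -> float:
    """Numerically stable log of the arithmetic mean of exp(log_values)."""
    top = max(log_values)
    total = sum(math.exp(v - top) for v in log_values)
    return top + math.log(total / len(log_values))


# Made-up log predictive densities, one per retained MCMC sample of the graph.
log_pred = [-690.0, -680.0, -700.0]
print(log_mean_exp(log_pred))
```

Subtracting the maximum before exponentiating keeps the computation stable when the log densities are large and negative, as they are here.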
5 Example: Modeling 20th Century American Voting Patterns

In an effort to demonstrate the product graphical model prior in higher dimensions, we now turn to the
modeling of American voting data by state. For each federal election from 1904 to 1976, occurring every
four years, we measure the proportion of votes for the Republican party in each of the 50 states [6]. Our
goal is to model and visualize correlation in voting pattern changes over the last century. Some immediate
questions come to mind: "Do certain states have an important role in determining election outcomes?", "Are
there groups of states which vote together, operating independently from the US as a whole?"

We proceed by exploring the posterior distribution resulting from the binomial prior with edge probability 
0.1, and the product graphical model prior with parameters a ^ 10,b ~ lO^'^, in an effort to make the overall 
number of edges resulting from each model comparable. Figure 4 shows the two graphs with highest posterior
density from each model. As expected, the binomial graphs contain long strings of variables, while the
product graphical model prior demonstrates clustering and grouping of variables. While the binomial prior
results in similar variables placed along the same string, the grouping from the product graphical model allows
for clearer interpretation. For instance, we immediately observe that the southern states (SC, MS, LA, AL,
GA, TX, VA, FL) generally vote in a group. Other patterns of interest also arise, including a close connection
between AR, NC, and TN. Also, notice that NY and KS are consistently the single nodes connecting clusters
of variables. As such, these states might be considered as key indicators of voting behavior.

6 Discussion 

While we have focused on the Bayesian approach to covariance selection, significant work has also been
done in a non-Bayesian framework. A common approach involves placing an $\ell_1$ penalization on the precision
matrix $\Sigma^{-1}$, which leads to sparse estimates [23, 30, 13]. Closer to the heart of this paper, [22] examine the
case of estimating $\mathcal{G}$ when clustering is expected, and therefore $\Sigma^{-1}$ exhibits block structure. However, these
models are neither decomposable nor generative.

While we have focused in this article on Gaussian graphical models, the prior defined in this article
is far more general and can be used with any type of model for handling discrete or mixed data; see e.g.
[21, 20]. We have also considered the hyperparameters $a$ and $b$ to be known constants. Estimating them
within the MCMC sampler would require one to compute the normalizing constant in (5), which is in general
not tractable. An exception of interest is the case $b \to 0$, where we can assign a gamma prior to $a$ and use the
data augmentation algorithm described in [29] to update $a$ given the other variables.

[Figure 4 panels. (a) PGM: HPD Graph 1; (b) Binom(0.01): HPD Graph 1; (c) PGM: HPD Graph 2; (d) Binom(0.01): HPD Graph 2.]

Figure 4: Voting example: two graphs with highest posterior density (HPD) from binomial and product
graphical model priors.

In conclusion, the proposed product graphical model prior improves flexibility in modeling decomposable 
graphical models and borrows strength from the immense literature on product partition and related models. 
The product graphical model prior allows one to encourage (or discourage) clustering of the graphs, and 
therefore can induce sparsity in the correlation matrix through clique separation; consequently, the product 
graphical model empowers practitioners to encapsulate their true prior beliefs to build a model more attuned 
to the problem at hand. 